Abstract. This article extends the idea of solving parity games by strategy iteration
to non-deterministic strategies: In a non-deterministic strategy a player restricts
himself to some non-empty subset of possible actions at a given node, instead of
limiting himself to exactly one action.

We show that a strategy-improvement algorithm by Bjorklund, Sandberg, and Vorobyov [3]
can easily be adapted to the more general setting of non-deterministic strategies.
Further, we show that applying the heuristic of "all profitable switches" (cf. UJ) leads
to choosing a "locally optimal" successor strategy in the setting of non-deterministic
strategies, thereby obtaining an easy proof of an algorithm by Schewe [13].

In contrast to [3], we present our algorithm directly for parity games which allows
us to compare it to the algorithm by Jurdzinski and Voge [15]: We show that the
valuations used in both algorithms coincide on parity game arenas in which one
player can "surrender". Thus, our algorithm can also be seen as a generalization
of the one by Jurdzinski and Voge to non-deterministic strategies.
Finally, using non-deterministic strategies allows us to show that the number of
improvement steps is bounded from above by $O(1.724^{|V_0|})$. For strategy-improvement
algorithms, this bound was previously only known to be attainable by using
randomization (cf. [7]).

1 Introduction 

A parity game arena consists of a directed graph $G = (V, E)$ where every vertex
belongs to exactly one of two players, called player 0 and player 1. Every vertex is colored
by some natural number in $\{0, 1, \ldots, d-1\}$. Starting from some initial vertex $v_0$, a play
of both players is an infinite path in $G$ where the owner of the current node determines
the next vertex. In a parity game, the winner of such an infinite play is then defined by
the parity of the maximal color which appears infinitely often along the given play.

As shown by Mostowski [11], and independently by Emerson and Jutla [4], there
exists a partition of $V$ in two sets $W_0$ and $W_1$ such that player $i$ has a memoryless
strategy, i.e. a map $\sigma_i : V_i \to V$ which maps every vertex $v$ controlled by player $i$ to
some successor of $v$, so that player $i$ wins any play starting from some $v \in W_i$ by using
$\sigma_i$ to determine his moves.

Interest in parity games arises as determining the winning set $W_0$ is equivalent to
deciding whether a given $\mu$-calculus formula holds w.r.t. a given Kripke structure,
i.e. determining $W_0$ is equivalent to the model checking problem of the $\mu$-calculus. Further
interest is sparked as it is known that solving parity games is in UP $\cap$ co-UP [8], but no
polynomial time algorithm has been found yet.

In this article we consider an approach for calculating the winning sets which is
known as strategy iteration or strategy improvement, and can be described as follows
in the setting of games: In a first step, a way for valuating the strategies of player 0 is
fixed, thereby inducing a partial order on the strategies of player 0. Then, one chooses
an initial strategy $\sigma : V_0 \to V$ for player 0. Iteratively (i) the current strategy is
valuated, (ii) by means of this valuation possible improvements of the current strategy are
determined, i.e. pairs $(u, v)$ such that $\sigma[u \mapsto v]$ is a strategy having a better valuation
than $\sigma$, (iii) a subset of the possible improvements is selected and implemented yielding
a better strategy $\sigma' : V_0 \to V$. These steps are repeated until no improvements can be
found anymore.

Although this approach usually (using no randomization U) allows only to give a 
bound exponential in $|V_0|$ on the number of iterations needed till termination, there is
no family of games known for which this approach leads to a super-polynomial number
of improvement steps. It is thus also used in practice, e.g. in compilers [14].

In particular, this approach has been successfully applied in several different
scenarios like Markov decision processes [6], stochastic games [5], or discounted
payoff games [12]. Using reductions, these algorithms can also be used for solving
parity games. In 2000 Jurdzinski and Voge [15] presented the first strategy-improvement
algorithm for parity games which directly works on the given parity game without
requiring any reductions to some intermediate representation. Although the algorithm by
Jurdzinski and Voge did not lead to a better upper bound on the complexity of deciding
the winner of a parity game with $n$ nodes and $d$ colors (the algorithm in [15] has a
complexity of $O((n/d)^d)$ whereas the upper bound of $O((n/d)^{d/2})$ was already known
at that time 0), it sparked a lot of interest as the strategy-improvement process w.r.t. 
parity games is directly observable and not obfuscated by some reduction. 

In this article, we extend strategy iteration to non-deterministic strategies: In a
non-deterministic strategy a player is not required to fix a single successor for any vertex
controlled by him; instead he restricts himself to some non-empty subset of all possible
successors. Using non-deterministic strategies seems to be more natural, as it allows a
player to only "disable" those moves along which the valuation of the current strategy
decreases. Our algorithm is an extension of an algorithm by Bjorklund, Sandberg, and
Vorobyov [3] proposed in 2004. In particular, we borrow their idea of giving one of
the two players the option to give up and "escape" an infinite play he would lose by
introducing a sink. In contrast to the original algorithm in [3] we present this extended
algorithm directly for parity games in order to be able to compare this algorithm directly
with the one by Jurdzinski and Voge, and also in the hope that this might lead to better
insights regarding the strategy improvement process.

Strategy iteration, as described above, chooses in step (iii) some subset of possible 
changes in order to obtain the next (deterministic) strategy. A natural question is how 
to choose this set of changes. Obviously, one would like to choose these sets in such 
a way that the total number of improvement steps is as small as possible - we call
this "globally optimal". As no efficient algorithm for determining these sets is known, 
usually heuristics are used instead. One heuristic applied quite often in the case of a 
binary arena is called "all profitable switches" (TJ: In a binary arena, given a strategy
$\sigma : V_0 \to V$ we can refer to the successors of $v \in V_0$ by $\sigma(v)$ and $\overline{\sigma}(v)$. A strategy
improvement step then amounts to deciding for every node $v \in V_0$ whether to switch
from $\sigma(v)$ to $\overline{\sigma}(v)$, or not. "All profitable switches" then refers to the heuristic of
switching to $\overline{\sigma}(v)$ for every $v \in V_0$ if this switch is an improvement w.r.t. the used valuation.
Transferring this heuristic to the setting of non-deterministic strategies the heuristic be- 
comes simply to choose the set of all possible improvements of the given strategy as 
the new strategy considered in the next step. We show that this simple heuristic leads 
to the "locally optimal" improvement, i.e. the strategy which is at least as good as 
any other strategy obtainable by implementing a subset of the possible improvements. 
By applying this heuristic in every step we obtain a new, in our opinion more natural 
and accessible, presentation of the algorithm by Schewe proposed in [13]: There only
valuations (referred to as "estimations" there), and deterministic strategies are consid- 
ered, whereas the strategy improvement process itself, and the connection to O are 
obfuscated. Further, the algorithm in [13] does not work directly on parity games, and
requires some unnecessary restrictions on the graph structure of the arena, e.g. only 
bipartite arenas are considered. 

We then compare our algorithm using non-deterministic strategies to the one by
Jurdzinski and Voge [15]. This is not possible w.r.t. the algorithms in [3] or [13] as these
do not work directly on parity games. Here, we can show that the valuations used in our
algorithm, resp. in [15], coincide, which readily allows us to conclude that the locally
optimal improvement obtained by our algorithm is always at least as good as any local
improvement obtainable by [15].

We obtain an upper bound of $O(|V|^2 \cdot |E| \cdot (\frac{|V|}{d} + 1)^d)$ for our algorithm which
is the same as the one obtainable when using deterministic strategies [3]. So using
non-deterministic strategies comes "for free". Of course, w.r.t. the sub-exponential
bound of $|V|^{O(\sqrt{|V|})}$ obtainable for the algorithm by Jurdzinski, Paterson and Zwick
(3, our algorithm is not competitive. Still, we think that our algorithm is interesting as 
strategy-iteration in practice only requires a polynomial number of improvement steps 
in general, as already mentioned above. In particular, we can show that the number 
of improvement steps done by our algorithm when using the "all profitable switches"-heuristic,
and thus by the one by Schewe [13], is bounded by $O(1.724^{|V_0|})$, whereas the
best known upper bound for strategy iteration when using only deterministic strategies
and no randomization in the improvement selection is $O(2^{|V_0|}/|V_0|)$ AD- In particular,
the bound of $O(1.724^{|V_0|})$ was previously known to be obtainable only by choosing the
improvements randomly [7].

Organization: Section 2 summarizes the standard definitions and results regarding parity
games. In Section 3 we extend parity games by allowing player 0 to terminate infinite
plays in order to escape an infinite play he would lose. This idea was first stated in 15] . 
We combine this with a generalization of the path profiles used in [15] in order to get
an algorithm working directly on parity games. Section 4 summarizes our strategy im- 
provement algorithm using non-deterministic strategies. Section 5 then compares the 
algorithm presented in this article with the one by Jurdzinski and Voge. 
2 Preliminaries



In this section we repeat the standard definitions and notations regarding parity games.
An arena $\mathcal{A}$ is given by $(V, E, o)$, where $(V, E)$ is a finite, directed graph and
$o : V \to \{0, 1\}$ assigns each node an owner. We denote by $V_i := o^{-1}(i)$ the set of all
nodes belonging to player $i \in \{0, 1\}$, and write $E_i$ for $E \cap (V_i \times V)$. Given some subset
$V' \subseteq V$ we write $\mathcal{A}|_{V'}$ for the restriction of the arena $\mathcal{A}$ to the nodes $V'$. A play
$\pi \in V^{\mathbb{N}} \cup V^*$ in $\mathcal{A}$ is any maximal path in $\mathcal{A}$ where we assume that player $i$ determines
the move $(\pi(j), \pi(j+1))$, if $\pi(j) \in V_i$. For $(V, E)$ a directed graph, and $s \in V$ a node
we write $sE$ for the set of successors of $s$.

For $\mathcal{A} = (V, E, o)$ an arena, a (memoryless) strategy of player $i$ (short: $i$-strategy)
($i \in \{0, 1\}$) is any subset $\sigma \subseteq E_i$ satisfying $\forall s \in V_i : |sE| > 0 \Rightarrow |s\sigma| > 0$, i.e. a
strategy does not introduce any new dead ends. $\sigma$ is deterministic, if $|s\sigma| \leq 1$ for all
$s \in V_i$. We write $E_\sigma$ for $E_{1-i} \cup \sigma$, and $\mathcal{A}|_\sigma$ for $(V, E_\sigma, o)$.

We assume that the reader is familiar with the concept of attractors. For conve- 
nience, a definition can be found in the appendix. 

A parity game arena $\mathcal{A}$ is given by $(V, E, o, c)$ where $(V, E, o)$ is an arena with
$vE \neq \emptyset$ for all $v \in V$, and $c : V \to \{0, 1, \ldots, d-1\}$ assigns each node a color.
The winner of a play $\pi$ in a parity game arena is given by $\limsup_{i \in \mathbb{N}} c(\pi(i)) \pmod 2$.
Given a node $s$, a strategy $\sigma \subseteq E_i$ is a winning strategy for $s$ of player $i$, if he wins
any play in $\mathcal{A}|_\sigma$ starting from $s$. Player $i$ wins a node $s$, if he has a winning strategy
for it. $W_i$ denotes the set of nodes won by player $i$. As we assume that every node has
at least one successor, there are only infinite plays in a parity game arena. W.l.o.g., we
further assume that $c^{-1}(k) \neq \emptyset$ for all $k \in \{0, 1, \ldots, d-1\}$ as we may otherwise
reduce $d$. A cycle $s_0 s_1 \ldots s_{n-1}$ (with $s_{i+1 \pmod n} \in s_i E$) in a parity game arena $\mathcal{A}$ is
called $i$-dominated, if the parity of its highest color is $i$. Player $i$ wins the node $s$ using
strategy $\sigma \subseteq V_i \times V$, iff every cycle reachable from $s$ in $\mathcal{A}|_\sigma$ is $i$-dominated.
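
The notion of an $i$-dominated cycle can be illustrated with a minimal sketch. The function name `dominating_player` and the representation of a cycle as a plain list of colors are illustrative assumptions and do not appear in the original text:

```python
def dominating_player(cycle_colors):
    """Return the player i for which a cycle is i-dominated.

    `cycle_colors` lists the colors c(s_0), ..., c(s_{n-1}) of the nodes on
    the cycle; the cycle is i-dominated if its highest color has parity i.
    """
    if not cycle_colors:
        raise ValueError("a cycle must contain at least one node")
    return max(cycle_colors) % 2


# Highest color 4 is even, so the first cycle is 0-dominated;
# highest color 5 is odd, so the second cycle is 1-dominated.
print(dominating_player([1, 4, 3]))
print(dominating_player([2, 5, 0]))
```

An infinite play that eventually repeats one cycle forever sees exactly the colors of that cycle infinitely often, so under this reading it is won by the player who dominates the cycle.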

Theorem 1. ([4, 11]) For any parity game arena $\mathcal{A}$ we have $W_0 \cup W_1 = V$. Player $i$
possesses a deterministic strategy $\sigma_i^* : V_i \to V$ with which he wins every node $s \in W_i$.

3 Escape Arenas 

In this section we extend parity games by allowing player 0 to escape an infinite play
which he would lose w.r.t. the parity game winning condition:

Let $\mathcal{A} = (V, E, o)$ be a parity game arena. We obtain the arena $\mathcal{A}_\bot = (V_\bot, E_\bot, o_\bot)$
from $\mathcal{A}$ by introducing a sink $\bot$ ($V_\bot := V \uplus \{\bot\}$) where only player 0 can choose to play
to $\bot$ ($E_\bot := E \cup V_0 \times \{\bot\}$). The sink $\bot$ itself has no out-going edges, and we assume
that player 0 controls $\bot$ ($o_\bot := o \cup \{(\bot, 0)\}$), although this is of no real importance.
Although this construction was first proposed in [3], we refer to $\mathcal{A}_\bot$ as escape arena in
the style of iTPJl . As $\mathcal{A}_\bot$ itself is no parity game arena anymore, we have to define the
winner of a finite play as well. For this we extend the definition of color profile,
which was first stated in [2], to finite plays:

For a given escape arena $\mathcal{A}_\bot$ using $d$ colors $\{0, 1, \ldots, d-1\}$, we define the set
$\mathcal{P}$ of color profiles by $\mathcal{P} := \mathbb{Z}^d \cup \{-\infty, \infty\}$ where $\mathbb{Z}^d$ is the set of $d$-dimensional
integer vectors. We write $0$ for the zero-profile $(0, 0, \ldots, 0) \in \mathbb{Z}^d$, and use standard
addition on $\mathbb{Z}^d$ for two profiles $p, p' \in \mathbb{Z}^d$. The idea of a profile $p \in \mathcal{P}$ is to count
how often a given color appears along a finite play, whereas $-\infty$, resp. $\infty$ correspond
to infinite plays won by player 1, resp. player 0. More precisely, for a finite sequence
$\pi = s_0 s_1 \ldots s_l$ of vertices, the value $\wp(\pi)$ of $\pi$ is the profile which counts how often
a color $k \in \{0, 1, \ldots, d-1\}$ appears in $c(s_0) c(s_1) \ldots c(s_l)$. For an infinite sequence
$\pi = s_0 s_1 \ldots$, its value $\wp(\pi)$ is defined to be $\infty$, if $\pi$ is won by player 0 w.r.t. the parity
game winning condition; otherwise $\wp(\pi) := -\infty$. Finally, we introduce a total order $\prec$
on $\mathcal{P}$ which tries to capture the notion of when one of two given plays is better than the
other for player 0: For this we set (i) $-\infty$ to be the bottom element of $\prec$, (ii) $\infty$ to be
the top element of $\prec$, and (iii) for all $p, p' \in \mathcal{P} \setminus \{-\infty, \infty\}$ we set:

$$p \prec p' \;:\Leftrightarrow\; \exists k \in \{0, 1, \ldots, d-1\} : k = \max\{j \in \{0, 1, \ldots, d-1\} \mid p_j \neq p'_j\}
\;\wedge\; \big( (k \equiv_2 0 \wedge p_k < p'_k) \vee (k \equiv_2 1 \wedge p_k > p'_k) \big).$$

Informally, the definition of $\prec$ says that player 0 hates to lose in an infinite play,
whereas he likes it the most to win an infinite play. So, whenever he can, he will try
to escape an infinite play he cannot win, therefore resulting in a finite play to $\bot$: here,
given two finite plays $\pi_1, \pi_2$ ending in $\bot$, player 0 looks for the highest color $c$ which
does not appear equally often along both plays. If $c$ is even, he prefers that play in which
it appears more often; if it is odd, he prefers the one in which it appears less often. In
particular, player 0 dislikes visiting odd-dominated cycles, while he likes visiting
even-dominated ones:

Lemma 1. Assume that $\chi = s_0 s_1 \ldots s_n$ is a non-empty cycle in the parity game arena
$\mathcal{A}$, i.e. $s_0 \in s_n E$ and $n \geq 0$. $\chi$ is 0-dominated, i.e. the highest color in $\chi$ is even, if and
only if $\wp(\chi) \succ 0$. $\chi$ is 1-dominated if and only if $\wp(\chi) \prec 0$.
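
The order on color profiles and the statement of Lemma 1 can be tried out with a small sketch. The names `profile`, `less`, `INF` and `NEG_INF`, as well as the encoding of the two infinite profiles as strings, are assumptions made for illustration only:

```python
INF, NEG_INF = "inf", "-inf"  # stand-ins for the profiles +oo and -oo


def profile(colors, d):
    """Color profile of a finite play: entry k counts occurrences of color k."""
    p = [0] * d
    for c in colors:
        p[c] += 1
    return tuple(p)


def less(p, q):
    """Strict order p < q on profiles, from the point of view of player 0."""
    if p == q:
        return False
    if p == NEG_INF or q == INF:
        return True
    if p == INF or q == NEG_INF:
        return False
    # highest color whose number of occurrences differs
    k = max(i for i in range(len(p)) if p[i] != q[i])
    # even color: more occurrences are better; odd color: fewer are better
    return p[k] < q[k] if k % 2 == 0 else p[k] > q[k]


d = 4
zero = (0,) * d
even_cycle = profile([1, 2, 1], d)  # highest color 2, i.e. 0-dominated
odd_cycle = profile([0, 3, 2], d)   # highest color 3, i.e. 1-dominated
# Lemma 1: the first profile lies above zero, the second one below zero.
print(less(zero, even_cycle), less(odd_cycle, zero))
```

Encoding a profile as a tuple indexed by color keeps the comparison close to the definition: the decisive position is the largest index at which the two tuples differ.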

Now, for a given parity game arena $\mathcal{A}$ let $\sigma_0^*, \sigma_1^*$ be the optimal winning strategies
of player 0, resp. 1. Further, let $W_0, W_1$ be the corresponding winning sets. Obviously,
both players can still use these strategies in $\mathcal{A}_\bot$, too, as we only added additional edges.
Especially, player 0 can still use $\sigma_0^*$ to win $W_0$ in $\mathcal{A}_\bot$ as only he has the option to move
to $\bot$. In the case of player 1, by applying $\sigma_1^*$ any cycle in $\mathcal{A}_\bot|_{\sigma_1^*}$ reachable from a
vertex $v \in W_1$ has to be odd-dominated. Hence, player 0 prefers to play in an acyclic
path from $v$ to $\bot$ in $\mathcal{A}_\bot|_{\sigma_1^*}$ when starting from a vertex in $W_1$.

Let therefore $p$ be the $\prec$-maximal value of any acyclic path terminating in $\bot$ in $\mathcal{A}_\bot$.
$p$ is the best player 0 can hope to achieve starting from a node $v \in W_1$ when player
1 plays optimally. We therefore define: player 0 wins a play $\pi$, if $\wp(\pi) \succ p$, otherwise
player 1 wins the play. Player $i$ wins a node $s \in V$, if he has a strategy $\sigma \subseteq E_i$ with
which he wins any play starting from $s$ in $\mathcal{A}_\bot|_\sigma$. As already sketched, this leads then to
the following theorem.

Theorem 2. Player $i$ wins the node $s$ in $\mathcal{A}$ iff he wins it in $\mathcal{A}_\bot$.

4 Strategy Improvement 

We now turn to the problem of finding optimal winning strategies by iteratively
valuating the strategy, and determining from this valuation possible better strategies. The
following section can be seen as the generalization of the algorithm in [3] to
non-deterministic strategies and explicitly stated in the setting of parity games. In fact, we
will only consider a special class of strategies for player 0, i.e. such strategies which do
not introduce any 1-dominated cycles. The strategy improvement process will assure
that no 1-dominated cycles are created. If there are any 1-dominated cycles in $\mathcal{A}_\bot|_{V_1}$,
then player 1 wins all the nodes in the 1-attractor to these cycles. We may, thus, identify
the nodes trivially won by player 1 in a preprocessing step, and remove them.

Assumption 1. The arena $\mathcal{A}_\bot|_{V_1}$ has no 1-dominated cycles.

Definition 1. We call a strategy $\sigma \subseteq E_0$ of player 0 reasonable, if there are no
1-dominated cycles in $\mathcal{A}_\bot|_\sigma$.

Remark 1. (a) By our assumption above the strategy $\sigma_\bot := V_0 \times \{\bot\}$ is reasonable, as
every 1-dominated cycle in $\mathcal{A}$ consists of at least one node controlled by player 0. (b)
Let $\sigma$ be any strategy of player 0, and $W_\sigma$ the set of nodes won by $\sigma$. Then, the strategy
$\sigma' = \sigma \cap (W_\sigma \times W_\sigma) \cup \{(s, \bot) \mid s \in V_0 \setminus W_\sigma\}$ is reasonable with $W_\sigma = W_{\sigma'}$.

We may thus assume that player 0 uses only reasonable strategies.

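Definition 2. Let $\sigma$ be some reasonable strategy of player 0. Its valuation
$\mathcal{V}_\sigma : V \cup \{\bot\} \to \mathcal{P}$ maps every node $s$ on the $\prec$-minimal value $\mathcal{V}_\sigma(s)$ which player 1 can
guarantee to achieve in any play starting from $s$ in $\mathcal{A}_\bot|_\sigma$ by using some memoryless strategy:

$$\mathcal{V}_\sigma(s) := \min_{\tau \subseteq E_1 \text{ strategy}} \; \max \{\wp(\pi) \mid \pi \text{ is a play in } \mathcal{A}_\bot|_{\sigma,\tau} \wedge \pi(0) = s\},$$

where minimum and maximum are taken w.r.t. $\prec$, and where we set $\mathcal{V}_\sigma(\bot) := 0$.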

Remark 2. (a) We will show later that, if we start from the reasonable strategy
$\sigma_\bot := V_0 \times \{\bot\}$, then our strategy-improvement algorithm will only generate reasonable
strategies. (Note, if $\mathcal{A}_\bot|_{\sigma_\bot}$ had 1-dominated cycles, then these would need to exist
solely in $\mathcal{A}|_{V_1}$ - but we have assumed above that we removed those in a preprocessing
step.) (b) As shown above, for all $s \in W_1$ player 1 can use his optimal winning strategy
$\sigma_1^*$ from the parity game to guarantee $\mathcal{V}_\sigma(s) \preceq p \prec \infty$.

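By means of the valuation $\mathcal{V}_\sigma$ we can partially order reasonable strategies in the natural
way:

Definition 3. For two (reasonable) strategies $\sigma_a, \sigma_b$ of player 0 we write $\sigma_a \preceq \sigma_b$, if
$\mathcal{V}_{\sigma_a}(s) \preceq \mathcal{V}_{\sigma_b}(s)$ for all nodes $s$. We write $\sigma_a \prec \sigma_b$, if there is at least one node $s$ such
that $\mathcal{V}_{\sigma_a}(s) \prec \mathcal{V}_{\sigma_b}(s)$. Finally, $\sigma_a \approx \sigma_b$, if $\sigma_a \preceq \sigma_b \wedge \sigma_b \preceq \sigma_a$.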

The following lemma addresses the calculation of $\mathcal{V}_\sigma$ using a straight-forward adaption
of the Bellman-Ford algorithm:

Lemma 2. Let $\sigma \subseteq E_0$ be a reasonable strategy of player 0. We define
$\mathcal{V}_\bot : V \cup \{\bot\} \to \mathcal{P}$ by $\mathcal{V}_\bot(\bot) := 0$, and $\mathcal{V}_\bot(s) = \infty$ for all $s \in V$, and the operator
$F_\sigma : (V \cup \{\bot\} \to \mathcal{P}) \to (V \cup \{\bot\} \to \mathcal{P})$ by

$$F_\sigma[\mathcal{V}](\bot) := 0, \qquad
F_\sigma[\mathcal{V}](s) := \begin{cases}
\wp(s) + \min_{\prec} \{\mathcal{V}(t) \mid (s,t) \in E_1\} & \text{if } s \in V_1, \\
\wp(s) + \max_{\prec} \{\mathcal{V}(t) \mid (s,t) \in \sigma\} & \text{if } s \in V_0,
\end{cases}$$

for any $\mathcal{V} : V \cup \{\bot\} \to \mathcal{P}$.

Then, the valuation $\mathcal{V}_\sigma$ of $\sigma$ is given as the limit of the sequence $F_\sigma^i[\mathcal{V}_\bot]$ for $i \to \infty$,
and this limit is reached after at most $|V|$ iterations.
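
The fixed-point iteration of Lemma 2 can be sketched as follows. This is only an illustration under stated assumptions: the names (`valuation`, `extreme`, `succ`, ...) are hypothetical, the top profile is encoded as the string `"inf"`, adding any finite profile to it is assumed to give the top profile again, and the tiny two-node arena is made up for the example.

```python
INF = "inf"   # stand-in for the top profile (infinite play won by player 0)
BOT = "bot"   # the sink


def less(p, q):
    """Strict profile order p < q from the point of view of player 0."""
    if p == q or p == INF:
        return False
    if q == INF:
        return True
    k = max(i for i in range(len(p)) if p[i] != q[i])
    return p[k] < q[k] if k % 2 == 0 else p[k] > q[k]


def add(p, q):
    """Profile addition; assumption: INF absorbs every finite profile."""
    if p == INF or q == INF:
        return INF
    return tuple(a + b for a, b in zip(p, q))


def extreme(values, want_max):
    """Maximal (want_max=True) or minimal element w.r.t. the order `less`."""
    result = values[0]
    for v in values[1:]:
        if (less(result, v) if want_max else less(v, result)):
            result = v
    return result


def valuation(nodes, owner, color, succ, d):
    """Apply the operator |V| times, starting from INF everywhere except BOT.

    `succ[s]` lists the successors of s in the arena restricted to sigma:
    all edges for player-1 nodes, only the edges of sigma for player-0 nodes.
    """
    zero = (0,) * d
    unit = {s: tuple(int(k == color[s]) for k in range(d)) for s in nodes}
    val = {s: INF for s in nodes}
    val[BOT] = zero
    for _ in range(len(nodes)):
        new = {BOT: zero}
        for s in nodes:
            choices = [val[t] for t in succ[s]]
            new[s] = add(unit[s], extreme(choices, owner[s] == 0))
        val = new
    return val


# Made-up arena: a (player 0, color 2) and b (player 1, color 1), b -> a.
nodes, owner, color = ["a", "b"], {"a": 0, "b": 1}, {"a": 2, "b": 1}
v_escape = valuation(nodes, owner, color, {"a": [BOT], "b": ["a"]}, 3)
v_both = valuation(nodes, owner, color, {"a": [BOT, "b"], "b": ["a"]}, 3)
print(v_escape, v_both)
```

In the first call player 0 always escapes, so both nodes receive finite profiles; in the second call the edge from a to b closes a cycle with highest color 2, and both nodes keep the top profile.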

Remark 3. (a) We assume unit cost for adding and comparing color profiles. The time
needed for calculating $\mathcal{V}_\sigma$ is then simply given by $O(|V| \cdot |E|)$. (b) For every $s \in V$
there has to be at least one edge $(s, t)$ with $\mathcal{V}_\sigma(s) = \wp(s) + \mathcal{V}_\sigma(t)$, as $\mathcal{V}_\sigma = F_\sigma[\mathcal{V}_\sigma]$.

Definition 4. We call any strategy $\sigma' \subseteq E_0$ a direct improvement of $\sigma$, if $\sigma' \subseteq I_\sigma$.

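Fact 1. Let $\sigma'$ be a direct improvement of $\sigma$. Then along every edge $(u, v)$ of $\mathcal{A}_\bot|_{\sigma'}$ we
have $\mathcal{V}_\sigma(u) \preceq \wp(u) + \mathcal{V}_\sigma(v)$. In particular, we have for any finite path $s_0 s_1 \ldots s_{l+1}$ in
$\mathcal{A}_\bot|_{\sigma'}$:

$$\mathcal{V}_\sigma(s_0) \preceq \wp(s_0) + \mathcal{V}_\sigma(s_1) \preceq \wp(s_0 s_1) + \mathcal{V}_\sigma(s_2) \preceq \ldots \preceq \wp(s_0 \ldots s_l) + \mathcal{V}_\sigma(s_{l+1}).$$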

From this easy fact, several important properties of direct improvements follow:

Corollary 1. If $\sigma$ is reasonable, then any 0-strategy $\sigma' \subseteq I_\sigma$ is reasonable, too.

Corollary 2. Let $\sigma$ be a reasonable strategy. For a direct improvement $\sigma'$ of $\sigma$ we have
that $\sigma \preceq \sigma'$. If $\sigma'$ contains at least one strict improvement of $\sigma$, then this inequality is
strict, i.e. $\sigma \prec \sigma'$.
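
As an illustration, the set of improvements and the set of strict improvements can be read off a valuation edge by edge. The sketch below assumes, in line with Fact 1, that an edge $(s, t)$ of player 0 counts as an improvement if the valuation of $s$ is at most the profile of $s$ plus the valuation of $t$, and as a strict improvement if the inequality is strict; this criterion, the function names and the example values are assumptions and not quoted from the text.

```python
INF = "inf"   # stand-in for the top profile
BOT = "bot"   # the sink


def less(p, q):
    """Strict profile order p < q from the point of view of player 0."""
    if p == q or p == INF:
        return False
    if q == INF:
        return True
    k = max(i for i in range(len(p)) if p[i] != q[i])
    return p[k] < q[k] if k % 2 == 0 else p[k] > q[k]


def add(p, q):
    """Profile addition; assumption: INF absorbs every finite profile."""
    if p == INF or q == INF:
        return INF
    return tuple(a + b for a, b in zip(p, q))


def improvements(val, color, edges0, d):
    """Return (all improvements, strict improvements) among player-0 edges."""
    all_impr, strict = set(), set()
    for s, targets in edges0.items():
        unit = tuple(int(k == color[s]) for k in range(d))
        for t in targets:
            candidate = add(unit, val[t])
            if less(val[s], candidate):
                strict.add((s, t))
                all_impr.add((s, t))
            elif val[s] == candidate:
                all_impr.add((s, t))
    return all_impr, strict


# Made-up valuation of the "always escape" strategy on a two-node arena:
# a (player 0, color 2) may move to b or to the sink, b (color 1) moves to a.
val = {BOT: (0, 0, 0), "a": (0, 0, 1), "b": (0, 1, 1)}
impr, strict = improvements(val, {"a": 2, "b": 1}, {"a": [BOT, "b"]}, 3)
print(sorted(impr), sorted(strict))
```

The edge to the sink keeps the current value and is therefore a (non-strict) improvement, while moving to b adds a second visit of the even color 2 and is a strict improvement.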

The preceding corollaries show that starting with an initial reasonable strategy $\sigma_0$,
e.g. $\sigma_\bot$, we can generate a sequence $\sigma_0, \sigma_1, \sigma_2, \ldots$ of reasonable strategies such that
$\mathcal{V}_{\sigma_i}(s) \preceq \mathcal{V}_{\sigma_{i+1}}(s)$ for all $s \in V$, if we choose the strategy $\sigma_{i+1}$ to be some direct
improvement of $\sigma_i$. Further, we know, if $\sigma_{i+1}$ uses at least one strict improvement $(s, t)$
of $\sigma_i$, i.e. $(s, t) \in \sigma_{i+1} \cap S_{\sigma_i} \neq \emptyset$, then we have $\mathcal{V}_{\sigma_i}(s) \prec \mathcal{V}_{\sigma_{i+1}}(s)$, i.e. every possible
reasonable strategy occurs at most once along the strategy improvement sequence. As
already shown, we have always $\mathcal{V}_{\sigma_i}(s) \preceq p \prec \infty$ for all nodes $s \in W_1$. The obvious
question is now, if we can reach an optimal winning strategy by this procedure, i.e. is a
reasonable strategy $\sigma$ with $S_\sigma = \emptyset$ optimal? This is answered in the following lemma.

Lemma 3. As long as there is a node $s \in W_0$ with $\mathcal{V}_\sigma(s) \prec \infty$, $\sigma$ has at least one
strict improvement.

Due to this lemma, we know that, if a reasonable strategy $\sigma$ has no strict improvements,
i.e. $S_\sigma = \emptyset$, then we have $\mathcal{V}_\sigma(s) = \infty$ for at least all the nodes $s \in W_0$. On the other
hand, for all nodes $s \in W_1$ we always have $\mathcal{V}_\sigma(s) \preceq p$. Hence, by the determinacy of
parity games, i.e. $W_1 = V \setminus W_0$, $\sigma$ has to be an optimal winning strategy for player
0, if $S_\sigma = \emptyset$. By our construction such an optimal strategy $\sigma$ with $S_\sigma = \emptyset$ might be
non-deterministic. The following lemma shows how one can deduce an optimal
deterministic strategy from such a $\sigma$.

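Lemma 4. Let $\sigma$ be a reasonable strategy of player 0 in $\mathcal{A}_\bot$, and $I_\sigma$ the strategy
consisting of all improvements of $\sigma$. Then every deterministic strategy $\sigma' \subseteq I_\sigma$ with
$\mathcal{V}_{I_\sigma}(s) = \wp(s) + \mathcal{V}_{I_\sigma}(t)$ for all $(s, t) \in \sigma'$ satisfies $\mathcal{V}_{I_\sigma} = \mathcal{V}_{\sigma'}$.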
Starting from $\sigma_\bot = \{(s, \bot) \mid s \in V_0\}$, if we improve the current strategy using at least
one strict improvement in every step, we will end up with an optimal winning strategy
for player 0. As in every step the valuation increases in at least those nodes at which a
strict improvement exists, and as there are at most $(\frac{|V|}{d} + 1)^d$ possible values a valuation
can assign a given node, the number of improvement steps is bounded by $|V| \cdot (\frac{|V|}{d} + 1)^d$.
The cost of every improvement step is given by the cost of the calculation of $\mathcal{V}_\sigma$, we
thus get:

Theorem 3. Let $\sigma_0$ be some reasonable 0-strategy. By iteratively taking $\sigma_{i+1}$ to be
some direct improvement of $\sigma_i$ which uses at least one strict improvement, one obtains
an optimal winning strategy after at most $|V| \cdot (\frac{|V|}{d} + 1)^d$ iterations. The total running
time is thus $O(|V|^2 \cdot |E| \cdot (\frac{|V|}{d} + 1)^d)$.

4.1 All Profitable Switches 

In the previous subsection we have not said anything about which direct improvement
should be taken in every improvement step. As no algorithms are known which
determine for a given strategy such a direct improvement that the total number of
improvement steps is minimal (we call such a direct improvement "globally optimal"),
one usually resorts to heuristics for choosing a direct improvement (see e.g. [Q]). Most
often the heuristic "all profitable switches" mentioned in the introduction is used. In the
case of non-deterministic strategies this simply becomes taking $I_\sigma$ as successor
strategy. The interesting fact here is that $I_\sigma$ is a "locally optimal" direct improvement for a
given reasonable strategy $\sigma$, i.e. for all strategies $\sigma' \subseteq I_\sigma$ we have $\sigma' \preceq I_\sigma$. We remark
that this has already been shown implicitly by Schewe in [13]:

Theorem 4. Let $\sigma$ be a reasonable strategy with $I_\sigma$ its set of improvements. For any
direct improvement $\sigma'$ of $\sigma$ we have $\sigma' \preceq I_\sigma$.

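We would like to give an easy proof for this theorem. We first note the following two properties
of the operator $F_\sigma$:

Fact 2. (i) For $\mathcal{V}, \mathcal{V}' : V \cup \{\bot\} \to \mathcal{P}$ with $\mathcal{V} \preceq \mathcal{V}'$ we have $F_\sigma[\mathcal{V}] \preceq F_\sigma[\mathcal{V}']$.
(ii) For two 0-strategies $\sigma_a \subseteq \sigma_b$ we have $F_{\sigma_a}[\mathcal{V}](s) \preceq F_{\sigma_b}[\mathcal{V}](s)$ for all $s \in V$.

Using (i) and (ii) we get by induction

$$F_{\sigma_a}^{i+1}[\mathcal{V}_\bot] = F_{\sigma_a}[F_{\sigma_a}^{i}[\mathcal{V}_\bot]] \preceq F_{\sigma_a}[F_{\sigma_b}^{i}[\mathcal{V}_\bot]] \preceq F_{\sigma_b}[F_{\sigma_b}^{i}[\mathcal{V}_\bot]] = F_{\sigma_b}^{i+1}[\mathcal{V}_\bot],$$

and therefore the following lemma:

Lemma 5. If $\sigma_a$ and $\sigma_b$ are reasonable and $\sigma_a \subseteq \sigma_b$, it holds that $\mathcal{V}_{\sigma_a} \preceq \mathcal{V}_{\sigma_b}$.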

Now, as the set of improvements $I_\sigma$ of a given reasonable strategy $\sigma$ is itself a
(non-deterministic) strategy, and every direct improvement $\sigma'$ of $\sigma$ satisfies $\sigma' \subseteq I_\sigma$ by
definition, the theorem from above follows. The algorithm of Schewe in [13] can therefore
be described as an optimized implementation of non-deterministic strategy iteration
using the "all profitable switches" heuristic.
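
Putting the pieces together, non-deterministic strategy iteration with the "all profitable switches" heuristic can be sketched as below. This is a naive illustration, not the optimized implementation of [13]: all names are hypothetical, the improvement criterion is the one suggested by Fact 1 (an edge is an improvement if it does not decrease the value of its source), the input arena is assumed to satisfy Assumption 1, and the four-node arena is made up.

```python
INF = "inf"   # stand-in for the top profile
BOT = "bot"   # the sink


def less(p, q):
    """Strict profile order p < q from the point of view of player 0."""
    if p == q or p == INF:
        return False
    if q == INF:
        return True
    k = max(i for i in range(len(p)) if p[i] != q[i])
    return p[k] < q[k] if k % 2 == 0 else p[k] > q[k]


def add(p, q):
    """Profile addition; assumption: INF absorbs every finite profile."""
    if p == INF or q == INF:
        return INF
    return tuple(a + b for a, b in zip(p, q))


def extreme(values, want_max):
    """Maximal (want_max=True) or minimal element w.r.t. the order `less`."""
    result = values[0]
    for v in values[1:]:
        if (less(result, v) if want_max else less(v, result)):
            result = v
    return result


def valuation(nodes, owner, unit, succ, d):
    """Bellman-Ford style computation of the valuation (cf. Lemma 2)."""
    zero = (0,) * d
    val = {s: INF for s in nodes}
    val[BOT] = zero
    for _ in range(len(nodes)):
        new = {BOT: zero}
        for s in nodes:
            choices = [val[t] for t in succ[s]]
            new[s] = add(unit[s], extreme(choices, owner[s] == 0))
        val = new
    return val


def solve(nodes, owner, color, edges, d):
    """Replace sigma by its set of improvements until none is strict."""
    unit = {s: tuple(int(k == color[s]) for k in range(d)) for s in nodes}
    # escape arena: every player-0 node gets an additional edge to the sink
    esc = {s: edges[s] + ([BOT] if owner[s] == 0 else []) for s in nodes}
    sigma = {s: [BOT] for s in nodes if owner[s] == 0}
    while True:
        succ = {s: sigma[s] if owner[s] == 0 else esc[s] for s in nodes}
        val = valuation(nodes, owner, unit, succ, d)
        nxt, strict = {}, False
        for s in sigma:
            nxt[s] = []
            for t in esc[s]:
                cand = add(unit[s], val[t])
                if not less(cand, val[s]):        # val[s] <= cand
                    nxt[s].append(t)
                    strict = strict or less(val[s], cand)
        if not strict:
            return val, {s for s in nodes if val[s] == INF}
        sigma = nxt


# Made-up arena with two components: a <-> b (colors 2, 1), c <-> e (colors 1, 0).
nodes = ["a", "b", "c", "e"]
owner = {"a": 0, "b": 1, "c": 0, "e": 1}
color = {"a": 2, "b": 1, "c": 1, "e": 0}
edges = {"a": ["b"], "b": ["a"], "c": ["e"], "e": ["c"]}
val, won_by_0 = solve(nodes, owner, color, edges, 3)
print(val, won_by_0)
```

In this example the cycle through a and b has highest color 2, so player 0 stops escaping there, whereas the cycle through c and e has highest color 1 and player 0 keeps escaping from c.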
We close this section with a remark on the calculation of $\mathcal{V}_{I_\sigma}$. Schewe proposes
an algorithm for calculating $\mathcal{V}_{I_\sigma}$ which uses $\mathcal{V}_\sigma$ to speed up the calculation leading to
$O(|E| \log |V|)$ operations on color-profiles instead of $O(|E| \cdot |V|)$. For this,
formulated in the notation of our algorithm, he introduces edge weights
$w(u, v) := (\wp(u) + \mathcal{V}_\sigma(v)) - \mathcal{V}_\sigma(u)$, and calculates w.r.t. these edges an update $\delta = \mathcal{V}_{I_\sigma} - \mathcal{V}_\sigma$. We argue
that one can use Dijkstra's algorithm for this, as we have $\mathcal{V}_\sigma(u) \preceq \wp(u) + \mathcal{V}_\sigma(v)$ along
all edges $(u, v) \in I_\sigma$, and thus $w(u, v) \succeq 0$, i.e. all edge weights are non-negative.

Proposition 1. $\mathcal{V}_{I_\sigma}$ can be calculated using Dijkstra's algorithm which needs $O(|V|^2)$
operations on color-profiles on dense graphs; for graphs whose out-degree is bounded by
some $b$ this can be improved to $O(b \cdot |V| \cdot \log |V|)$ by using a heap$^1$.

This gives us a running time of $O(|V|^3 \cdot (\frac{|V|}{d} + 1)^d)$, resp. $O(|V|^2 \cdot b \cdot \log |V| \cdot (\frac{|V|}{d} + 1)^d)$.



5 Comparison with the Algorithm by Jurdzinski and Voge 

This section compares the algorithm presented in this article with the one by Jurdzinski
and Voge [15]. We first give a short (slightly imprecise) description of the algorithm in
[15]: This algorithm starts in each step with some deterministic 0-strategy $\sigma$. Using $\sigma$,
a valuation $\Omega_\sigma$ is calculated (see below for details about $\Omega_\sigma$). Then, by means of this
valuation possible strategy improvements are determined, and finally some non-empty
subset of these improvements is chosen, but only one improvement per node at most,
such that implementing these improvements yields a deterministic strategy again. This
process is repeated until there are no improvements anymore w.r.t. the current strategy.

The valuation $\Omega_\sigma$: We present a slightly "optimized" version of the valuation used in
[15]. The valuation $\Omega_\sigma(s)$ of a deterministic 0-strategy $\sigma$ consists of the cycle value
$z_\sigma(s)$, the path profile $\wp_\sigma(s)$, and the path length $l_\sigma(s)$ which are defined as follows:

- As $\sigma$ is deterministic, all plays in $\mathcal{A}|_\sigma$ are determined by player 1. For every node $z$
having odd color, we can decide whether there is at least one cycle in $\mathcal{A}|_\sigma$ such that
this cycle is dominated by $z$. Let $Z$ be the set of all odd colored nodes dominating
a cycle in $\mathcal{A}|_\sigma$.

Given a node $s$ we define $z_\sigma(s)$ to be a node of maximal color in $Z$, which is
reachable from $s$ in $\mathcal{A}|_\sigma$; if no node in $Z$ is reachable from $s$ in $\mathcal{A}|_\sigma$, then $s$ has to
be won by player 0, and we set $z_\sigma(s) = \infty$.

- If $z_\sigma(s)$ is some odd colored node, the second component $\wp_\sigma(s)$ becomes the color
profile of a $\prec$-minimal play from $s$ to $z_\sigma(s)$ in $\mathcal{A}|_\sigma$ - with the restriction that only
nodes of color $> c(z_\sigma(s))$ are counted.

- Finally, if $\wp_\sigma(s)$ is defined, $l_\sigma(s)$ is the length of a shortest play from $s$ to $z_\sigma(s)$ w.r.t.
$\wp_\sigma(s)$, if $z_\sigma(s)$ has odd color.

1 In [3] the authors propose another optimization to speed up the calculation of $\mathcal{V}_\sigma$ by
restricting the re-calculation of $\mathcal{V}_\sigma$ to only those nodes where $\mathcal{V}_\sigma$ changes. Those nodes can be easily
identified by calculating an attractor again in time $O(|E|)$. Unfortunately, combining this
optimization with the one by Schewe ([13]) does not lead to a better asymptotic upper bound.
Remark 4. We assume here that $z_\sigma$ is either $\infty$, if $s$ is already won using $\sigma$, or the
"worst" odd-dominated cycle into which player 1 can force a play starting from $s$. In
[15], the authors even try to optimize $z_\sigma(s)$ when $s$ is already won using $\sigma$. These
improvements are obviously unnecessary, as we can always remove the attractor to
these nodes from the arena in an intermediate step in order to obtain a smaller arena.

Further, it is assumed in [15] that every node is uniquely colored. Therefore, in
[15] $\wp_\sigma$ is defined to be the set of nodes having higher color than $z_\sigma(s)$ on a "worst"
path from $s$ to $z_\sigma(s)$. Jurdzinski and Voge already mention at the end of [15] that their
algorithm also works when not assuming that every vertex is uniquely colored, but do
not present the adapted data structures needed in this case. This was done in [2]: If the
same color is used for several vertices, it is sufficient to only count the number of nodes
having a color $k > z_\sigma(s)$ along such a "worst" path from $s$ to $z_\sigma(s)$, where a "worst" path
then simply means a $\prec$-minimal path. Therefore, the color profiles used in this article
are a direct generalization of the path profiles used in [15].

In [15] an edge $(s, t) \in E_0$ is now called a strict improvement over $(s, \sigma(s))$, if $\Omega_\sigma(t)$
is strictly better than $\Omega_\sigma(\sigma(s))$, i.e. either the "worst" cycle improves, or the worst
play to it improves, or the length of a worst play becomes longer ("the longer player 0
can stay away from $z_\sigma$ the better for him"). A deterministic strategy $\sigma'$ is then a direct
improvement of a given deterministic strategy $\sigma$ w.r.t. [15], if it differs from $\sigma$ only in
strict improvements.

Definition 5. For a given parity game arena $\mathcal{A} = (V, E, c, o)$, set

$$\mathcal{A}^\bot := (V \cup \{\bot\},\; E \cup V_0 \times \{\bot\} \cup \{(\bot, \bot)\},\; c \cup \{(\bot, -1)\},\; o \cup \{(\bot, 0)\}).$$

$\mathcal{A}^\bot$ results from $\mathcal{A}_\bot$ by simply adding a loop to $\bot$, and giving $\bot$ the color $-1$ so that
$\mathcal{A}^\bot$ is a parity game arena where $\bot$ is the cycle dominated by the least odd color. A
straight-forward adaption of the proof of Theorem 2 shows that player 0 wins a node $s$
in $\mathcal{A}$ iff he wins it in $\mathcal{A}^\bot$.
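
The construction of Definition 5 is purely syntactic and can be sketched in a few lines. The function name `add_sink_with_loop`, the dictionary-based encoding of the arena and the example arena are assumptions for illustration; the sketch also assumes that, as in the escape arena, only nodes of player 0 receive an edge to the sink.

```python
BOT = "bot"  # the sink


def add_sink_with_loop(nodes, edges, color, owner):
    """Add the sink, escape edges for player 0, and a self-loop of color -1."""
    new_edges = {s: list(targets) for s, targets in edges.items()}
    for s in nodes:
        if owner[s] == 0:
            new_edges[s].append(BOT)   # player 0 may give up at any of his nodes
    new_edges[BOT] = [BOT]             # the loop makes every play infinite again
    new_color = {**color, BOT: -1}     # -1 is odd and below every other color
    new_owner = {**owner, BOT: 0}
    return list(nodes) + [BOT], new_edges, new_color, new_owner


# Made-up arena: a (player 0, color 2) and b (player 1, color 1).
nodes, edges, color, owner = add_sink_with_loop(
    ["a", "b"], {"a": ["b"], "b": ["a"]}, {"a": 2, "b": 1}, {"a": 0, "b": 1}
)
print(nodes, edges, color, owner)
```

Every node again has at least one successor, so the result is a parity game arena in the sense of Section 2, and the loop at the sink is a 1-dominated cycle whose highest color lies below all colors of the original arena.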

Now, as the strategy improvement algorithm in [15] tries to play to the "best" possible
cycle, an optimal strategy (obtained by the algorithm) will always choose to play to
$\bot$ from a node $s$, if $s$ cannot be won by player 0, as every other 1-dominated cycle has
at least 1 as maximal color. A strategy $\sigma$ of player 0 is therefore "reasonable" w.r.t.
the algorithm by Jurdzinski and Voge, if $(\bot, \bot)$ is the only 1-dominated cycle in $\mathcal{A}^\bot|_\sigma$.

Obviously, we now have a one-to-one correspondence between reasonable strategies
$\sigma$ in $\mathcal{A}_\bot$, and reasonable strategies $\sigma$ in $\mathcal{A}^\bot$ of player 0: we simply have to remove
or add the edge $(\bot, \bot)$ to move from $\mathcal{A}_\bot$ to $\mathcal{A}^\bot$ and vice versa. We therefore may
identify these strategies in the following as one strategy.

This allows us to compare the improvement step of the algorithm presented in this
article with that of [15]. Indeed, as the color of $\bot$ is $-1$ (recall that all other nodes have
colors $\geq 0$), we have $\wp_\sigma(s) = \mathcal{V}_\sigma(s)$ for all nodes with $z_\sigma(s) = \bot$, and $\mathcal{V}_\sigma(s) = \infty$, if
$z_\sigma(s) \neq \bot$. This proves the following proposition:

Proposition 2. Any (deterministic) direct improvement a' of a identified by H15V is a 
subset of la-. Therefore a' ^ I a . 






In other words, the algorithm presented here always chooses locally a direct improvement
of $\sigma$ which is at least as good as any deterministic direct improvement obtainable
by [15]. In the appendix, a small example can be found illustrating this.

5.1 Bound on the number of Improvement Steps 

We finish this section by giving an upper bound on the total number of improvement 
steps when using the "all profitable switches"-heuristic. In the case of an arena with out- 
degree two, one can show that the number of improvement steps done by the algorithm 
in fl3J is bounded by O(C^) (cf. HI). 

When considering non-deterministic strategies the heuristic "all profitable switches"
naturally generalizes to simply taking $I_\sigma$ as successor strategy in every iteration. Here
we can show the following upper bound:

Theorem 5. Let $\mathcal{A}_\perp$ be an escape-parity-game arena where every node of player 0 has
at most two successors. Then the number of improvement steps needed to reach an
optimal winning strategy is bounded by $3 \cdot 1.724^{|V_0|}$ when using non-deterministic strategy
iteration and the "all profitable switches"-heuristic.

Remark 5. To the best of our knowledge this is the best upper bound known for any
deterministic strategy-improvement algorithm. In [7] a similar bound is only obtained
by using randomization.



6 Conclusions 

In the first part of the article, we presented an extended version of the algorithm by UJ 
which (i) allows the use of non-deterministic strategies, and (ii) works directly on the 
given parity game arena without requiring a reduction to a mean payoff game as an
intermediate step. For (ii), we used the path profiles introduced in [15], resp. a generalized
version of it called color profiles (see also 12). 

We then showed that the heuristic "all profitable switches" in the setting of
non-deterministic strategies leads to the locally best direct improvement, and therefore to
the algorithm presented in [13]. We further identified the fast calculation of the valuation
proposed by Schewe as an application of Dijkstra's algorithm.

Finally, we turned to the comparison of the algorithm presented here to the one by
Jurdzinski and Voge [15]. As our algorithm works directly on parity games in contrast
to [3,13], we could show that the valuations used in both coincide for parity game arenas
with escape for player 0. We finished the article by adapting results from [10] which
allowed us to show that using the "all profitable switches"-heuristic in the setting of
non-deterministic strategies yields an upper bound of $O(1.724^{|V_0|})$ on the
total number of improvement steps. This bound also carries over to the algorithm in
[13]. This bound was previously only attainable using randomization [7].

A Example: Comparison with the Algorithm by Jurdzinski and Voge




a) depicts an arena $\mathcal{A}^\perp$ where bold arrows represent the edges of a 0-strategy $\sigma$, and
dashed arrows represent edges not included in $\sigma$. Further, all nodes belong to player 0,
where the numbers inside the nodes represent the colors. b) shows the set $S_\sigma$ of strict
improvements w.r.t. $\sigma$. c) The heuristic usually applied for choosing a deterministic
direct improvement of $\sigma$ is to take a maximal subset of $S_\sigma$ so that for every node for
which a strict improvement exists, exactly one strict improvement is chosen. In
this example this leads to the strategy depicted in c). d) The algorithm presented in this
article, on the other hand, chooses the non-deterministic strategy $I_\sigma = \sigma \cup S_\sigma$, as shown
in d). e) Calculating the valuations of both $I_\sigma$ and the strategy shown in e) shows that
both strategies are equivalent w.r.t. their valuation (see also Lemma 4). This means the
strategy $I_\sigma$ is already optimal, in contrast to the strategy in c).

B Missing Proofs 
B.l Preliminaries 

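Definition 6. Given an arena $\mathcal{A} = (V, E, o)$ and a target set $T \subseteq V$ of nodes, we
define the $i$-attractor $\mathrm{Attr}_i[\mathcal{A}](T)$ to $T$ in $\mathcal{A}$ by

\[ A_0 := T \]

\[ A_{j+1} := A_j \cup \{s \in V_i \mid sE \cap A_j \neq \emptyset\} \cup \{s \in V_{1-i} \mid sE \subseteq A_j\} \]

\[ \mathrm{Attr}_i[\mathcal{A}](T) := \bigcup_{j \ge 0} A_j. \]

The rank $r(s) \in \mathbb{N} \cup \{\infty\}$ of a node $s$ w.r.t. $\mathrm{Attr}_i[\mathcal{A}](T)$ is given by

\[ \min\{j \in \mathbb{N} \mid s \in A_j\} \]

where we assume that $\min \emptyset = \infty$.

A strategy $\sigma \subseteq E_i$ is then an $i$-attractor strategy to $T$, if for every $(s,t) \in \sigma$ the
rank decreases along $(s,t)$ as long as $s$ has finite, non-zero rank.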
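
The layered construction of Definition 6 can be computed directly. The following Python sketch is only an illustration: the function name `attractor`, the dictionary encoding of the arena (`succ`, `owner`) and the small example arena are assumptions made here and do not appear in the article.

```python
def attractor(nodes, succ, owner, player, target):
    """Return (attractor set, rank) of the `player`-attractor to `target`.

    succ[s] lists the successors sE of s, owner[s] is 0 or 1.
    Nodes missing from the returned rank dict have rank infinity.
    """
    rank = {s: 0 for s in target}            # A_0 := T
    layer = 0
    changed = True
    while changed:
        changed = False
        layer += 1
        current = set(rank)                  # A_j, frozen for this round
        for s in nodes:
            if s in rank:
                continue
            if owner[s] == player:
                # the attracting player needs one edge into A_j
                joins = any(t in current for t in succ[s])
            else:
                # the opponent must have all edges leading into A_j
                joins = all(t in current for t in succ[s])
            if joins:
                rank[s] = layer              # first j+1 with s in A_{j+1}
                changed = True
    return set(rank), rank


# Hypothetical example: compute the 1-attractor to the sink "bot".
nodes = ["bot", "a", "b", "c"]
succ = {"bot": ["bot"], "a": ["bot", "b"], "b": ["a"], "c": ["c", "a"]}
owner = {"bot": 0, "a": 1, "b": 0, "c": 0}
attr, rank = attractor(nodes, succ, owner, player=1, target={"bot"})
```

In this example node `c` can stay on its self-loop forever, so it does not belong to the attractor and keeps rank $\infty$.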

Remark 6. Obviously, player $i$ can use any $i$-attractor strategy to force any play starting
from a node with finite rank into $T$ on an acyclic path as the rank is strictly decreasing
until $T$ is hit.

B.2 Parity Game Arenas with Escape for Player 0

Lemma 1. Assume that $\chi = s_0 s_1 \ldots s_n$ is a non-empty cycle in the parity game arena
$\mathcal{A}$, i.e. $s_0 \in s_n E$ and $n \ge 0$. $\chi$ is 0-dominated, i.e. the highest color in $\chi$ is even, if and
only if $p(\chi) \succ 0$. $\chi$ is 1-dominated if and only if $p(\chi) \prec 0$.

Proof. W.l.o.g. we may assume that $s_0$ has the dominating color in $\chi$. As all remaining
nodes in $\chi$ have at most color $c(s_0)$, the color profile $p(\chi)$ is 0 for all colors $> c(s_0)$.
Hence, the highest color in which $p(\chi)$ and 0 differ is $c(s_0)$. If $c(s_0)$ is even, then $p(\chi) \succ 0$
by definition, otherwise $p(\chi) \prec 0$, as $p(\chi)_{c(s_0)} > 0$. The other direction is shown
similarly. □
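
The comparison used in this proof can be made concrete. The sketch below assumes, as the proof suggests, that a color profile counts how often each color occurs and that two profiles are compared at the highest color in which they differ: more occurrences of an even color are better for player 0, more occurrences of an odd color are worse. The helper names `profile`, `compare` and `dominated_by` are illustrative and not taken from the article.

```python
def profile(colors, d):
    """Color profile of a sequence of colors from {0, ..., d-1}."""
    p = [0] * d
    for c in colors:
        p[c] += 1
    return tuple(p)


def compare(p, q):
    """Return -1, 0 or 1 if p is smaller than, equal to, or larger than q."""
    for c in reversed(range(len(p))):        # highest color first
        if p[c] != q[c]:
            bigger = p[c] > q[c]
            # even color: a larger count is better; odd color: it is worse
            return 1 if bigger == (c % 2 == 0) else -1
    return 0


def dominated_by(cycle_colors, d):
    """Player (0 or 1) dominating a non-empty cycle, via its profile."""
    zero = (0,) * d
    return 0 if compare(profile(cycle_colors, d), zero) > 0 else 1
```

For instance, a cycle with colors 2, 1, 1 has the even color 2 as its maximum and compares above the zero profile, while a cycle with colors 3, 2, 2 compares below it.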

Theorem 2. Player $i$ wins the node $s$ in $\mathcal{A}$ iff he wins it in $\mathcal{A}_\perp$.

Proof. Let $\sigma_i^*$ be the optimal, memoryless winning strategy of player $i$ in the parity game $\mathcal{A}$, and
$W_i$ the winning set of player $i$ w.r.t. $\sigma_i^*$.

First consider the case $s \in W_0$. As only player 0 can choose to move to $\perp$, any play
$\pi$ in $\mathcal{A}_\perp$ w.r.t. $\sigma_0^*$ is a play in $\mathcal{A}$, too. Hence, it is infinite, and won by player 0 w.r.t. the
parity game winning condition. Thus, $\pi$ has the value $\infty$.

Assume now that $s \in W_1$. Player 1 can use his optimal strategy to force player 0
starting from $s$ into a play such that every cycle visited is 1-dominated. If player 0 does
not move to $\perp$, the infinite play also exists in the original parity game arena, is therefore
won by player 1, and, hence, has the value $-\infty$ in the escape game. On the other hand,
in the escape parity game player 0 has now the option to escape any such infinite
play by opting to terminate the game by moving to $\perp$. Consider therefore a finite play
$\pi = s_0 s_1 \ldots s_n \perp$. Assume that this path is not acyclic. Thus, as we are only counting
how often a given color appears along the path, we may split $\pi$ into a simple path $\pi'$
from $s_0$ to $\perp$ and several cycles $\chi_1, \ldots, \chi_l$. By using his winning strategy $\sigma_1^*$ player 1
can make sure that every such cycle has an odd color as maximal color. It is now easy
to see that $p(\chi_j) \prec 0$ by definition of $\prec$. Thus, we have

p(n) = p(tt') + p(xi) + ... + p(xi) ~< p(tt') d P- 

□ 



B.3 Strategy Improvement 

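Lemma 2. Let $\sigma \subseteq E_0$ be a reasonable strategy of player 0. We define $\mathcal{V}_\perp : V \cup \{\perp\} \to \mathcal{P}$
by $\mathcal{V}_\perp(\perp) := 0$, and $\mathcal{V}_\perp(s) := \infty$ for all $s \in V$, and the operator
$F_\sigma : (V \cup \{\perp\} \to \mathcal{P}) \to (V \cup \{\perp\} \to \mathcal{P})$ by

\[ F_\sigma[\mathcal{V}](\perp) := 0 \]

\[ F_\sigma[\mathcal{V}](s) := p(s) + \min\nolimits_\preceq \{\mathcal{V}(t) \mid (s,t) \in E_1\} \quad \text{if } s \in V_1 \]

\[ F_\sigma[\mathcal{V}](s) := p(s) + \max\nolimits_\preceq \{\mathcal{V}(t) \mid (s,t) \in \sigma\} \quad \text{if } s \in V_0, \]

for any $\mathcal{V} : V \cup \{\perp\} \to \mathcal{P}$.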

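Then, the valuation $\mathcal{V}_\sigma$ of $\sigma$ is given as the limit of the sequence $F_\sigma^i[\mathcal{V}_\perp]$ for $i \to \infty$,
and this limit is reached after at most $|V|$ iterations.

Proof. For all $\mathcal{V}, \mathcal{V}' : V \cup \{\perp\} \to \mathcal{P}$ with $\mathcal{V}(s) \preceq \mathcal{V}'(s)$ for $s \in V \cup \{\perp\}$ we have
$F_\sigma[\mathcal{V}](s) \preceq F_\sigma[\mathcal{V}'](s)$, too, i.e. $F_\sigma$ is monotone. Obviously, we have $F_\sigma^1[\mathcal{V}_\perp](s) \preceq
\mathcal{V}_\perp(s)$ for all $s \in V \cup \{\perp\}$. Therefore, $F_\sigma^i[\mathcal{V}_\perp](s)$ is monotonically decreasing for
$i \to \infty$.

As $\sigma$ is reasonable, $\mathcal{V}_\sigma(s) \succ -\infty$, and it can only be finite, if $s$ is in the 1-attractor
to $\perp$ in $\mathcal{A}_\perp|_\sigma$. Further, for $\mathcal{V}_\sigma(s) \prec \infty$, $\mathcal{V}_\sigma(s)$ has to be the value of an acyclic play
$\pi$ in $\mathcal{A}_\perp|_\sigma$. One therefore checks easily that $\mathcal{V}_\sigma$ is a fixed point of $F_\sigma$; hence, by the
monotonicity of $F_\sigma$, and $\mathcal{V}_\sigma \preceq \mathcal{V}_\perp$, we have $\mathcal{V}_\sigma \preceq F_\sigma^i[\mathcal{V}_\perp]$ for all $i \in \mathbb{N}$.

Let $C_i$ be the set of nodes $s \in V \cup \{\perp\}$ such that $F_\sigma^i[\mathcal{V}_\perp](s) = \mathcal{V}_\sigma(s)$. Obviously,
we have $\perp \in C_i$ for all $i \in \mathbb{N}$. As $F_\sigma^i[\mathcal{V}_\perp]$ is monotonically decreasing, and bounded
from below by $\mathcal{V}_\sigma$, we have $C_i \subseteq C_{i+1}$.

Define $B_i$ to be the boundary of $C_i$, i.e. the set of nodes $s \in V \setminus C_i$ with $sE \cap C_i \neq \emptyset
\wedge sE \cap (V \setminus C_i) \neq \emptyset$.

If $B_i \subseteq V_0$, then player 0 has a strategy to stay away from $\perp \in C_i$ for every node
$s \in V \setminus C_i$. It is easy to see that $F_\sigma^i[\mathcal{V}_\perp](s) = \infty$ for all $s \in V \setminus C_i$ in this case.

Thus, assume $B_i \cap V_1 \neq \emptyset$. As player 1 eventually needs to enter $C_i$ in order to
reach $\perp$, he has to use an edge from a node $s' \in V_1 \cap B_i$ to $C_i$. At least for this node $s'$
we have to have $s' \in C_{i+1}$.

Hence, we have to have $C_i = V$ for some $i \le |V|$, implying $F_\sigma^{i+1}[\mathcal{V}_\perp] = F_\sigma^i[\mathcal{V}_\perp]$.

□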
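
The fixed-point iteration of the lemma can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a color profile is encoded as a tuple of counts per color, $p(s)$ is taken to be the unit profile of the color of $s$, `None` stands for the value $\infty$, and the order is the one used in the proof of Lemma 1 (highest differing color decides, even colors count for player 0). All names (`valuation`, `key`, `add`, the example arena) are hypothetical.

```python
INF = None  # stands for the value "infinity" (play won by player 0)


def key(v):
    """Sort key realizing the order on values (Python compares tuples
    lexicographically, so the highest color is placed first)."""
    if v is INF:
        return (1, ())
    signed = [x if c % 2 == 0 else -x for c, x in enumerate(v)]
    return (0, tuple(reversed(signed)))


def add(p, v):
    """Componentwise sum of two profiles; infinity absorbs everything."""
    return INF if v is INF else tuple(a + b for a, b in zip(p, v))


def valuation(nodes, owner, color, d, edges1, sigma, sink="bot"):
    """Iterate F_sigma starting from 0 at the sink and infinity elsewhere."""
    zero = (0,) * d
    unit = {s: tuple(int(c == color[s]) for c in range(d)) for s in nodes}
    val = {s: INF for s in nodes}
    val[sink] = zero
    for _ in range(len(nodes)):              # at most |V| rounds are needed
        new = {sink: zero}
        for s in nodes:
            if owner[s] == 1:                # player 1 minimizes over his edges
                best = min((val[t] for t in edges1[s]), key=key)
            else:                            # player 0 maximizes within sigma
                best = max((val[t] for t in sigma[s]), key=key)
            new[s] = add(unit[s], best)
        if new == val:
            break
        val = new
    return val


# Hypothetical example: both player-0 nodes escape to the sink.
nodes = ["a", "b", "c"]
owner = {"a": 0, "b": 1, "c": 0}
color = {"a": 1, "b": 2, "c": 0}
sigma = {"a": ["bot"], "c": ["bot"]}
edges1 = {"b": ["a", "c"]}
val = valuation(nodes, owner, color, 3, edges1, sigma)
```

In the example player 1 at node `b` prefers the successor `a`, because the play through `a` collects the odd color 1.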

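Definition 7. We write $\tau_\sigma \subseteq E_1$ for the 1-strategy consisting of the edges $(s,t)$ with

\[ \mathcal{V}_\sigma(s) = p(s) + \mathcal{V}_\sigma(t). \]

Corollary 1. If $\sigma$ is reasonable, then any direct improvement $\sigma'$ of $\sigma$ is reasonable, too.

Proof. For any cycle $s_0 s_1 \ldots s_l$ with $s_0 \in s_l E_{\sigma'}$, we have

\[ \mathcal{V}_\sigma(s_0) \preceq p(s_0 \ldots s_l) + \mathcal{V}_\sigma(s_0), \text{ i.e. } 0 \prec p(s_0 \ldots s_l). \]

□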

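Corollary 2. Let $\sigma$ be a reasonable strategy.

(a) For a direct improvement $\sigma'$ of $\sigma$ we have that $\mathcal{V}_\sigma(s) \preceq \mathcal{V}_{\sigma'}(s)$ for all $s \in V$.

(b) If $(s,t) \in \sigma'$ is a strict improvement of $\sigma$, then $\mathcal{V}_\sigma(s) \prec \mathcal{V}_{\sigma'}(s)$.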

Proof, (a) Let s be any node. For any play it = sqsi . . . s„_L starting from s in ^4j_ | CT 
we have already shown: 

V CT (s) 1 P(vr) + V CT (±) - p(tt) ^ V CT '(s). 

(b) As $(s,t)$ is a strict improvement of $\sigma$, we have (i) $\mathcal{V}_\sigma(s) \prec p(s) + \mathcal{V}_\sigma(t)$, (ii) $s \in V_0$,
and, hence, (iii) $\mathcal{V}_{\sigma'}(s) = \max_\preceq\{p(s) + \mathcal{V}_{\sigma'}(t') \mid (s,t') \in \sigma'\}$. With the result from
(a) it follows that

\[ \mathcal{V}_\sigma(s) \prec p(s) + \mathcal{V}_\sigma(t) \preceq p(s) + \mathcal{V}_{\sigma'}(t) \preceq \mathcal{V}_{\sigma'}(s). \]

□

Lemma 3. As long as there is a node $s \in W_0$ with $\mathcal{V}_\sigma(s) \prec \infty$, $\sigma$ has at least one
strict improvement.

Proof. Let $A$ be the set of nodes $t$ with $\mathcal{V}_\sigma(t) \prec \infty$, i.e. $A$ is the 1-attractor to $\perp$ in
$\mathcal{A}_\perp|_\sigma$. By assumption we have $W_0 \cap A \neq \emptyset$. Assume $s \in A \cap W_0$. Let $\pi$ be any play
determined by $\tau_\sigma$ and $\sigma_0^*$. As $\sigma_0^*$ is optimal and $s \in W_0$, $\pi$ stays in $W_0$ forever, i.e. the
play is infinite.

First, assume $\pi$ does not leave $A$. Every time it uses an edge $(u,v)$ which does not
exist in $\mathcal{A}_\perp|_\sigma$ it has to hold that $u \in V_0$. Hence, if $\sigma$ is assumed not to be strictly improvable, we have
to have $\mathcal{V}_\sigma(u) \succeq p(u) + \mathcal{V}_\sigma(v)$ for all edges $(u,v) \in \sigma_0^*$. On the other hand, we have
$\mathcal{V}_\sigma(u) = p(u) + \mathcal{V}_\sigma(v)$ along edges $(u,v) \in \tau_\sigma$. Thus, the value of any cycle visited
by $\pi$ is $\prec 0$, a contradiction.

Therefore, consider the case that $\pi$ leaves $A$. This also has to happen along an edge
$(u,v)$ with $u \in V_0$. As $u \in A$ and $v \in V \setminus A$ we have $\mathcal{V}_\sigma(u) \prec \infty = \mathcal{V}_\sigma(v)$. Hence,
$(u,v)$ is a strict improvement. □

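Lemma 4. Let $\sigma$ be a reasonable strategy of player 0 in $\mathcal{A}_\perp$, and $I_\sigma$ the strategy
consisting of all improvements of $\sigma$.

Then every deterministic strategy $\sigma' \subseteq I_\sigma$ with $\mathcal{V}_{I_\sigma}(s) = p(s) + \mathcal{V}_{I_\sigma}(t)$ for all
$(s,t) \in \sigma'$ satisfies $\mathcal{V}_{I_\sigma} = \mathcal{V}_{\sigma'}$.

Proof. By definition, $\sigma'$ is a direct improvement of $I_\sigma$, hence, we have $\mathcal{V}_{I_\sigma}(s) \preceq \mathcal{V}_{\sigma'}(s)$
for all nodes $s$.

On the other hand, $\sigma'$ is also a direct improvement of $\sigma$, as $\sigma' \subseteq I_\sigma$. Thus, we have
$\mathcal{V}_{\sigma'}(s) \preceq \mathcal{V}_{I_\sigma}(s)$ for all $s \in V$. □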

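Lemma 6. (a) For $\sigma_a$ and $\sigma_b$ two reasonable strategies of player 0, we define the strategy
$\sigma_{ab}$ by

\[ (s,t) \in \sigma_{ab} :\Leftrightarrow \max\nolimits_\preceq\{\mathcal{V}_{\sigma_a}(s), \mathcal{V}_{\sigma_b}(s)\} \preceq p(s) + \max\nolimits_\preceq\{\mathcal{V}_{\sigma_a}(t), \mathcal{V}_{\sigma_b}(t)\}. \]

Then $\max_\preceq\{\mathcal{V}_{\sigma_a}(s), \mathcal{V}_{\sigma_b}(s)\} \preceq \mathcal{V}_{\sigma_{ab}}(s)$ for all $s \in V$, i.e. there is a strategy $\bar\sigma$
such that for all other strategies $\sigma$ we have $\mathcal{V}_\sigma(s) \preceq \mathcal{V}_{\bar\sigma}(s)$ for all $s \in V$.
(b) If $\mathcal{V}_\sigma(s) \prec \mathcal{V}_{\bar\sigma}(s)$ for at least one $s \in V$, then $\sigma$ has a strict improvement.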

Proof. (a) We first show that $\sigma_{ab}$ is indeed a strategy. Consider any $s \in V_0$. Then there
is at least one $t_a$ s.t. $(s, t_a) \in \sigma_a$ and $\mathcal{V}_{\sigma_a}(s) = p(s) + \mathcal{V}_{\sigma_a}(t_a)$, and similarly a $t_b$ with
the same properties w.r.t. $\sigma_b$. Assume $\mathcal{V}_{\sigma_a}(s) \preceq \mathcal{V}_{\sigma_b}(s)$ (the other case being similar).
By definition of $\mathcal{V}_\sigma$ we then have

Va a {tb) ^V CTa (t„) =p(s)+V aa (s) r< p(s)+V CT6 (s) = V fft (t&), 

i.e. $(s, t_b) \in \sigma_{ab}$.

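By definition, we have

\[ \max\nolimits_\preceq\{\mathcal{V}_{\sigma_a}(s), \mathcal{V}_{\sigma_b}(s)\} \preceq p(s) + \max\nolimits_\preceq\{\mathcal{V}_{\sigma_a}(t), \mathcal{V}_{\sigma_b}(t)\} \qquad (*) \]

along every edge $(s,t) \in \sigma_{ab}$. For any edge $(s,t) \in E_1$, we have

\[ \mathcal{V}_{\sigma_a}(s) \preceq p(s) + \mathcal{V}_{\sigma_a}(t) \quad \text{and} \quad \mathcal{V}_{\sigma_b}(s) \preceq p(s) + \mathcal{V}_{\sigma_b}(t). \]

Hence, (*) holds along every edge of $\mathcal{A}_\perp|_{\sigma_{ab}}$. Therefore, any cycle in $\mathcal{A}_\perp|_{\sigma_{ab}}$ has to be
0-dominated, again, i.e. $\sigma_{ab}$ is reasonable, too.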

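If $\mathcal{V}_{\sigma_{ab}}(s) = \infty$, there is nothing to show. Assume $\mathcal{V}_{\sigma_{ab}}(s) \prec \infty$, first, and let
$\pi = s_0 s_1 \ldots s_n \perp$ be any acyclic play with $p(\pi) = \mathcal{V}_{\sigma_{ab}}(s)$. Because of (*) we then
have $\max_\preceq\{\mathcal{V}_{\sigma_a}(s), \mathcal{V}_{\sigma_b}(s)\} \preceq p(\pi) = \mathcal{V}_{\sigma_{ab}}(s)$, again.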

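(b) If there is some node $s \in V$ with $\mathcal{V}_\sigma(s) \prec \mathcal{V}_{\bar\sigma}(s) = \infty$, we already know that $\sigma$
has a strict improvement as it is not optimal ($s$ is won by $\bar\sigma$ but not by $\sigma$).

Therefore assume that $\mathcal{V}_{\bar\sigma}(s') = \infty$ implies $\mathcal{V}_\sigma(s') = \infty$ for all nodes $s'$, and let $s$ be
a node with $\mathcal{V}_{\bar\sigma}(s) \prec \infty$. Let $\pi$ again be an acyclic play in $\mathcal{A}_\perp|_{\bar\sigma, \tau_\sigma}$ with $p(\pi) = \mathcal{V}_{\bar\sigma}(s)$,
i.e. player 0 uses $\bar\sigma$ and player 1 his response-strategy $\tau_\sigma$ for $\sigma$.

As $\sigma$ has no strict improvements, we have $\mathcal{V}_\sigma(s) \succeq p(s) + \mathcal{V}_\sigma(t)$ for all edges
$(s,t) \in E_0$; on the other hand, along the edges $(s,t) \in \tau_\sigma$ we have $\mathcal{V}_\sigma(s) = p(s) +
\mathcal{V}_\sigma(t)$ by definition of $\tau_\sigma$.

Hence, we get $\mathcal{V}_{\bar\sigma}(s) \succeq \mathcal{V}_\sigma(s) \succeq p(\pi) = \mathcal{V}_{\bar\sigma}(s)$, if $\sigma$ has no strict improvements.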

□ 

Proposition 1. $\mathcal{V}_{I_\sigma}$ can be calculated using Dijkstra's algorithm which needs $O(|V|^2)$
operations on color profiles on dense graphs; for graphs whose out-degree is bounded by
some $b$ this can be improved to $O(b \cdot |V| \cdot \log |V|)$ by using a heap.

Proof. Let $\sigma$ be a reasonable strategy of player 0, and $A$ the 1-attractor to $\perp$ in $\mathcal{A}_\perp|_{I_\sigma}$.
For all nodes $s \in V \setminus A$, we have $\mathcal{V}_{I_\sigma}(s) = \infty$. We therefore have only to consider the
graph $(A, E_{I_\sigma} \cap A \times A)$ in order to calculate $\mathcal{V}_{I_\sigma}$ for the nodes in $A$.

Recall that we have for every edge $(u,v)$ in $\mathcal{A}_\perp|_{I_\sigma}$ that $\mathcal{V}_\sigma(u) \preceq p(u) + \mathcal{V}_\sigma(v)$.
Define now for $(u,v) \in E_{I_\sigma} \cap A \times A$ the function $w$ by $w(u,v) := (p(u) + \mathcal{V}_\sigma(v)) -
\mathcal{V}_\sigma(u) \succeq 0$. Hence, for any path $\pi' = t_0 t_1 \ldots t_n \perp$ in $(A, E_{I_\sigma} \cap A \times A)$ we have

\[ \begin{aligned}
p(\pi') - \mathcal{V}_\sigma(t_0)
&= p(t_0) + \ldots + p(t_n) + (\mathcal{V}_\sigma(t_1) - \mathcal{V}_\sigma(t_1)) + \ldots + (\mathcal{V}_\sigma(t_n) - \mathcal{V}_\sigma(t_n)) - \mathcal{V}_\sigma(t_0) \\
&= (p(t_0) + \mathcal{V}_\sigma(t_1) - \mathcal{V}_\sigma(t_0)) + \ldots + (p(t_n) + \mathcal{V}_\sigma(\perp) - \mathcal{V}_\sigma(t_n)) \\
&= w(t_0, t_1) + w(t_1, t_2) + \ldots + w(t_n, \perp).
\end{aligned} \]

Therefore, for any $s \in A$ we have that $\mathcal{V}_{I_\sigma}(s)$, i.e. the $\preceq$-minimal value player 1 can
guarantee to achieve in a play starting from $s$, has to be $\mathcal{V}_\sigma(s)$ plus the $\preceq$-minimal value
$\delta_\sigma(s)$ player 1 can guarantee starting from $s$ in the edge-weighted graph $(A, E_{I_\sigma} \cap A \times
A, w)$.

As $w(u,v) \succeq 0$, we can use Dijkstra's algorithm to find $\delta_\sigma(s)$ with the restriction
that we only may add a node controlled by player 0 to the boundary in every step of
Dijkstra's algorithm, if all successors of this node have already been evaluated. We then
have $\delta_\sigma(s) = \mathcal{V}_{I_\sigma}(s) - \mathcal{V}_\sigma(s)$. □
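
The restricted variant of Dijkstra's algorithm described in this proof can be sketched as follows. To keep the example short, plain non-negative numbers stand in for the profile-valued weights $w(u,v)$ and the usual order on numbers stands in for $\preceq$; this substitution, the function name `min_max_distances` and the example graph are assumptions made for illustration only. The sketch also assumes that every node lies in the 1-attractor of the sink.

```python
import heapq


def min_max_distances(succ, owner, sink):
    """succ[u] = list of (v, w) with w >= 0; owner[u] is 0 or 1.

    Player 1 minimizes and player 0 maximizes the total weight to `sink`.
    """
    pred = {u: [] for u in succ}
    for u, edges in succ.items():
        for v, w in edges:
            pred[v].append((u, w))
    unresolved = {u: len(edges) for u, edges in succ.items()}
    tentative = {}
    final = {}
    heap = [(0, sink)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in final:
            continue                         # stale heap entry
        final[v] = d
        for u, w in pred[v]:
            if u in final or u == sink:
                continue
            cand = w + d
            unresolved[u] -= 1
            if owner[u] == 1:
                # player 1 may use any evaluated successor: classic relaxation
                if cand < tentative.get(u, float("inf")):
                    tentative[u] = cand
                    heapq.heappush(heap, (cand, u))
            else:
                # player 0 takes the maximum, and the node is only queued
                # once all of its successors have been evaluated
                tentative[u] = max(tentative.get(u, cand), cand)
                if unresolved[u] == 0:
                    heapq.heappush(heap, (tentative[u], u))
    return final


# Hypothetical example graph with sink "bot".
succ = {
    "bot": [],
    "a": [("bot", 5), ("b", 1)],   # player 1
    "b": [("bot", 1), ("c", 0)],   # player 0
    "c": [("bot", 3)],             # player 1
}
owner = {"bot": 0, "a": 1, "b": 0, "c": 1}
dist = min_max_distances(succ, owner, "bot")
```

With a binary heap each edge is relaxed once, which matches the $O(b \cdot |V| \cdot \log |V|)$ bound stated in the proposition for out-degree at most $b$.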

B. 4 Comparison with the Algorithm by Jurdzinski and Voge 

Theorem 5. Let $\mathcal{A}_\perp$ be an escape-parity-game arena where every node of player 0
has at most two successors. Then the number of improvement steps needed to reach an
optimal winning strategy is bounded by $3 \cdot 1.724^{|V_0|}$.






Proof. Assumption 2. We assume that player 0 can only choose between at most two
different successors in every state controlled by him, i.e. $\forall v \in V_0 : |vE| \in \{1, 2\}$.

Let $(\sigma_\perp = \sigma_0) \prec \sigma_1 \prec \ldots \prec (\sigma_l = \bar\sigma)$ be the sequence of strategies produced
by the strategy-improvement algorithm presented in this article. As already shown, we
may assume that $\sigma_i$ is deterministic.

For $\sigma_i$ let $k_i$ be the number of nodes $s \in V_0$ such that there is at least one strict
improvement of $\sigma_i$ at $s$, i.e.

\[ k_i := |\mathrm{src}(S_{\sigma_i})| \quad \text{with} \quad \mathrm{src}(S_{\sigma_i}) := \{s \in V \mid \exists (s,t) \in S_{\sigma_i}\}. \]

(Recall that $S_\sigma$ is defined to be the set of strict improvements of a given strategy $\sigma$.)

Then there are at least $2^{k_i} - 1$ deterministic direct improvements $\sigma'$ of $\sigma_i$ with
$\sigma_i \prec \sigma'$ and $\sigma' \setminus \sigma_i \subseteq S_{\sigma_i}$.²

We then have $\sigma_i \prec \sigma' \preceq \sigma_{i+1}$ for every such $\sigma'$. Now, as $\sigma_i \prec \sigma_{i+1}$, we know that
every such $\sigma'$ has not been considered in a previous step ($< i$) nor will it be considered
in any following step ($> i$). Therefore, at least $2^{k_i} - 1$ new deterministic strategies can
be ruled out as candidates for optimal winning strategies.
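
The count $2^{k_i} - 1$ can be checked by brute force. The sketch below assumes out-degree two, so that each improvable node has exactly one strict improvement edge, and enumerates every non-empty subset of switches. The encoding of strategies as dictionaries and the names `direct_improvements`, `sigma` and `switch` are illustrative assumptions, not notation from the article.

```python
from itertools import product


def direct_improvements(sigma, switch):
    """Yield every deterministic strategy obtained from `sigma` by applying
    a non-empty subset of the strict improvements listed in `switch`."""
    nodes = sorted(switch)
    for choice in product((False, True), repeat=len(nodes)):
        if not any(choice):
            continue                         # the empty subset is sigma itself
        improved = dict(sigma)
        for s, use in zip(nodes, choice):
            if use:
                improved[s] = switch[s]
        yield improved


# Hypothetical deterministic strategy with k_i = 3 improvable nodes.
sigma = {"a": "x", "b": "x", "c": "y", "d": "y"}
switch = {"a": "y", "b": "z", "c": "x"}
candidates = list(direct_improvements(sigma, switch))
```

Here `candidates` contains $2^3 - 1 = 7$ pairwise different strategies, none of which equals `sigma`.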

Hence, if Sk is the number of deterministic strategies which have at most k nodes 
at which there exists at least one strict improvement, we get as an upper bound for the 
number of improvement steps 

ol^ol 

The next lemma bounds the number $S_{k'}$ of strategies $\sigma_i$ having the same value $k'$ for $k_i$.

Lemma 7. Let $(\sigma_i)_{0 \le i \le l} = \sigma_\perp = \sigma_0 \prec \sigma_1 \prec \ldots \prec \sigma_l = \bar\sigma$ be the sequence of
reasonable deterministic strategies generated by the strategy improvement algorithm.

For an arena $\mathcal{A}_\perp$ with $|sE| \le 2$ for all $s \in V_0$ it holds that there are at most $\binom{|V_0|}{k'}$
strategies in $(\sigma_i)_{0 \le i \le l}$ with $|\mathrm{src}(S_{\sigma_i})| = k'$.

Proof. First note the following easy fact: Along any edge $(s,t) \in \sigma$, we have
$\mathcal{V}_\sigma(s) \succeq p(s) + \mathcal{V}_\sigma(t)$ by definition of $F_\sigma$. Thus, for any strategy $\sigma \subseteq E_0$ of player 0
it holds that $S_\sigma \cap \sigma = \emptyset$.

Next, let $\sigma_a$ and $\sigma_b$ be two reasonable strategies of player 0 in $\mathcal{A}_\perp$. We claim that it
holds that

(a) If $S_{\sigma_b} \cap \sigma_a = \emptyset$, we have $\sigma_a \preceq \sigma_b$.

(b) Assume that $|sE| \le 2$ for all $s \in V_0$. If $\mathrm{src}(S_{\sigma_b}) \subseteq \mathrm{src}(S_{\sigma_a})$, it holds that $\sigma_a \preceq \sigma_b$.

Before giving the proofs of these two claims, note that (b) already implies that we can
have at most $\binom{|V_0|}{k'}$-many strategies $\sigma_i$ with $k_i = k'$, as this is the number of
subsets of $V_0$ with $k'$ distinct elements.

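In order to show (b), we first need to show (a): (a) Let $\mathcal{A}'_\perp$ be the arena resulting
from $\mathcal{A}_\perp$ by removing all strict improvements of $\sigma_b$ from $E$, i.e. $E' = E \setminus S_{\sigma_b}$. Both $\sigma_a$
and $\sigma_b$ are reasonable strategies of player 0 in $\mathcal{A}'_\perp$, as we only remove edges and these
edges are neither used by $\sigma_a$ nor by $\sigma_b$. This also means that the operators $F_{\sigma_a}$ and $F_{\sigma_b}$
stay unchanged, implying that the valuations of $\sigma_a$ (resp. $\sigma_b$) on $\mathcal{A}_\perp$ and $\mathcal{A}'_\perp$ coincide.
But as $\sigma_b$ has no strict improvements in $\mathcal{A}'_\perp$, it has to hold that $\sigma_b$ is an optimal winning
strategy in $\mathcal{A}'_\perp$, meaning that $\sigma_a \preceq \sigma_b$ (cf. Lemma 6).

(b) Set $C = S_{\sigma_b} \cap \sigma_a$. For every $s \in \mathrm{src}(C)$ we find a $t_s^{C}$ such that $(s, t_s^{C}) \in C$,
a $t_s^{\sigma_b}$ with $(s, t_s^{\sigma_b}) \in \sigma_b$ (as $\sigma_b$ is a strategy), and a $t_s^{S_{\sigma_a}}$ with $(s, t_s^{S_{\sigma_a}}) \in S_{\sigma_a}$ (as
$\mathrm{src}(S_{\sigma_b}) \subseteq \mathrm{src}(S_{\sigma_a})$).

Now, because of $S_\sigma \cap \sigma = \emptyset$ for any strategy $\sigma$, we may conclude that $t_s^{C} \neq t_s^{\sigma_b}$,
and $t_s^{C} \neq t_s^{S_{\sigma_a}}$ for all $s \in \mathrm{src}(C)$. Thus, as we assume that $|sE| \le 2$, it has to hold that
$t_s^{S_{\sigma_a}} = t_s^{\sigma_b}$ for all $s \in \mathrm{src}(C)$. We define therefore $C' = \{(s, t_s^{\sigma_b}) \mid s \in \mathrm{src}(C)\}$, and

\[ \sigma' := C' \cup \sigma_a \setminus C. \]

As $C' \subseteq S_{\sigma_a}$, we have $\sigma_a \preceq \sigma'$. Further $\sigma' \preceq \sigma_b$, as $\sigma' \cap S_{\sigma_b} = \emptyset$. □

² Note that we do not claim that $\sigma_{i+1}$ is one of these strategies $\sigma'$.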

The last lemma can be found in [10] for Markov decision processes.
As long as $1 \le k \le \frac{|V_0|}{3}$, we have

\[ S_k \le \sum_{k'=0}^{k} \binom{|V_0|}{k'} \le 2 \binom{|V_0|}{k} \le 2 \left( \frac{|V_0| \cdot e}{k} \right)^{k}. \]



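What remains is to find a $1 \le k \le \frac{|V_0|}{3}$ such that
\[ 2^{|V_0| - k} + 2 \left( \frac{|V_0| \cdot e}{k} \right)^{k} \]
is minimal. For this set $b = \frac{|V_0|}{k}$ with $b \ge 3$, yielding
\[ e^{|V_0| \cdot \frac{b-1}{b} \ln 2} + 2 \, e^{|V_0| \cdot \frac{1 + \ln b}{b}}. \]
As $\frac{1 + \ln b}{b}$ is strictly decreasing and $\frac{b-1}{b} \ln 2$ is strictly increasing, we need to look for the
largest $b \ge 3$ such that

\[ \frac{1 + \ln b}{b} = \frac{b - 1}{b} \ln 2. \]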

Using e.g. Newton's method one can easily check that $b \in (4.6, 4.7)$ with $b \approx 4.66438$.
We therefore get

\[ 3 \cdot e^{0.545 \cdot |V_0|} < 3 \cdot 1.724^{|V_0|} < 3 \cdot 1.313^{|V|} \]

as an alternative upper bound for the number of improvement steps for an arena with
out-degree two.³ □
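
The numerical value of $b$ can be reproduced with a few Newton steps. The sketch below assumes that the equation to be solved is $1 + \ln b = (b - 1) \ln 2$, which is consistent with the stated root $b \approx 4.66438$; the function names `f`, `f_prime` and `newton` and the starting point 4.6 are choices made for this illustration.

```python
import math


def f(b):
    """Difference of both sides of the assumed equation 1 + ln b = (b - 1) ln 2."""
    return 1.0 + math.log(b) - (b - 1.0) * math.log(2.0)


def f_prime(b):
    """Derivative of f."""
    return 1.0 / b - math.log(2.0)


def newton(b, steps=20):
    """Plain Newton iteration b <- b - f(b) / f'(b)."""
    for _ in range(steps):
        b -= f(b) / f_prime(b)
    return b


b = newton(4.6)
exponent = (1.0 + math.log(b)) / b   # growth rate per node of player 0
base = math.exp(exponent)            # base of the exponential bound
```

Under these assumptions `exponent` evaluates to roughly 0.5445, so `base` stays just below 1.724, and the square root of 1.724 is about 1.313.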



3 Using a more detailed analysis in the spirit of Q~) one can even show an upper bound of 

$O(1.71^{|V_0|})$.